Boost K-Means

Authors

  • Wanlei Zhao
  • Cheng-Hao Deng
  • Chong-Wah Ngo
Abstract

Thanks to its simplicity and versatility, k-means has remained popular since it was proposed three decades ago, and continuous efforts have been made since then to enhance its performance. Unfortunately, a good trade-off between quality and efficiency has rarely been reached. In this paper, a novel k-means variant is presented. Unlike most k-means variants, the clustering procedure is explicitly driven by an objective function, which is feasible for the whole l2-space. The classic chicken-and-egg loop in k-means is simplified to a pure stochastic optimization procedure. K-means therefore becomes simpler, faster and better. The effectiveness of this new variant has been studied extensively in different contexts, such as document clustering, nearest neighbor search and image clustering. Superior performance is observed across these scenarios.
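
The abstract does not spell out the objective function or the update rule, so the following is only a minimal sketch of what an objective-driven, one-sample-at-a-time k-means could look like, not the authors' implementation. It assumes the standard identity that, for a cluster r with n_r members, centroid c_r and vector sum D_r, the sum of ||x - c_r||^2 over its members equals the sum of ||x||^2 minus ||D_r||^2 / n_r. Minimizing the usual k-means distortion is therefore the same as maximizing I = sum over r of ||D_r||^2 / n_r. The function names (`sq_norm`, `incremental_kmeans`), the random initial partition and the greedy "move if I increases" rule are all assumptions made for illustration.

```python
import random


def sq_norm(v):
    """Squared l2 norm of a vector given as a sequence of numbers."""
    return sum(x * x for x in v)


def incremental_kmeans(points, k, max_passes=20, seed=0):
    """Hypothetical sketch: visit samples in random order and move each one
    to the cluster that most increases I = sum_r ||D_r||^2 / n_r.
    Assumes len(points) >= k."""
    rng = random.Random(seed)
    dim = len(points[0])

    # Assumed initialisation: a random partition with no empty cluster.
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)

    # Per-cluster vector sums D_r and sizes n_r.
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x

    order = list(range(len(points)))
    for _ in range(max_passes):
        rng.shuffle(order)
        moved = False
        for i in order:
            p, u = points[i], labels[i]
            if counts[u] == 1:
                continue  # never empty a cluster
            # Contribution of the source cluster before and after removing p.
            before_u = sq_norm(sums[u]) / counts[u]
            after_u = sq_norm([s - x for s, x in zip(sums[u], p)]) / (counts[u] - 1)
            best_gain, best_v = 1e-12, u  # small tolerance against float noise
            for v in range(k):
                if v == u:
                    continue
                before_v = sq_norm(sums[v]) / counts[v]
                after_v = sq_norm([s + x for s, x in zip(sums[v], p)]) / (counts[v] + 1)
                gain = (after_u + after_v) - (before_u + before_v)
                if gain > best_gain:
                    best_gain, best_v = gain, v
            if best_v != u:
                # Apply the move by updating only the two affected clusters.
                for j, x in enumerate(p):
                    sums[u][j] -= x
                    sums[best_v][j] += x
                counts[u] -= 1
                counts[best_v] += 1
                labels[i] = best_v
                moved = True
        if not moved:
            break  # no single move improves the objective
    return labels, sums, counts


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, _, counts = incremental_kmeans(data, k=2, max_passes=100)
    print(labels, counts)
```

In this sketch there is no separate assignment step followed by a centroid update step: each accepted move updates the two affected cluster sums immediately, which is one way to read the abstract's description of replacing the alternating loop with a stochastic optimization procedure.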

Similar resources

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

K-Boost: A Scalable Algorithm for High-Quality Clustering of Microarray Gene Expression Data

Microarray technology for profiling gene expression levels is a popular tool in modern biological research. Applications range from tissue classification to the detection of metabolic networks, from drug discovery to time-critical personalized medicine. Given the increase in size and complexity of the data sets produced, their analysis is becoming problematic in terms of time/quality trade-offs...

A clustering method based on boosting

It is widely recognized that the boosting methodology provides superior results for classification problems. In this paper, we propose the boost-clustering algorithm which constitutes a novel clustering methodology that exploits the general principles of boosting in order to provide a consistent partitioning of a dataset. The boost-clustering algorithm is a multi-clustering method. At each boos...

Complex Scene Analysis in Urban Areas Based on an Ensemble Clustering Method Applied on Lidar Data

3D object extraction is one of the main interests and has lots of applications in photogrammetry and computer vision. In recent years, airborne laser-scanning has been accepted as an effective 3D data collection technique for extracting spatial object models such as digital terrain models (DTM) and building models. Data clustering, also known as unsupervised learning, is one of the key technique...

Boosting the Performances of the Recurrent Neural Network by the Fuzzy Min-Max

The k-means training algorithm used for the RBF (Radial Basis Function) neural network can have some weaknesses, such as empty clusters, the choice of the number of clusters and the random choice of the centers of these clusters. In this paper, we use the Fuzzy Min-Max technique to boost the performance of the training algorithm. This technique is used to determine the number of the k centers and to in...

Journal:
  • CoRR

Volume abs/1610.02483

Publication date 2016